Abstract

In this extended abstract we provide a unifying framework that can be used to
characterize and compare the expressive power of query languages for different data base
models. The framework is based upon the new idea of a valid partition, that is, a partition
of the elements of a given data base where each class of the partition is composed of
elements that cannot be separated (distinguished) according to some level of information
contained in the data base. We describe two applications of this new framework, first
by deriving a new syntactic characterization of the expressive power of relational algebra
which is equivalent to the one given by Paredaens, and subsequently by studying the
expressive power of a simple graph-based data model.

1 Introduction

The relational data base model, introduced by Codd in [7], has been particularly successful
since it is a mathematically elegant model well suited to describe almost all "real world"
situations. Since the query languages associated with this model (the relational algebra and
the relational calculus) have a formal and simple definition, an interesting field of research
is the study of the expressive power of such languages. Codd [8] proved that the relational
algebra is equivalent to the relational calculus, in the sense that both query languages can
compute the same set of relations.

A breakthrough in this field [3][12] has been a syntactic characterization of the set of
relations that can be computed in a given data base. These results, also known as BP-completeness,
are based on the principle of data independence from the physical representation: the
information that can be extracted from the data base is completely determined at the logical level
of the data base. This fact can be stated in a simple way: a relation $R$ can be computed
from a data base $D$ if and only if all permutations over the elements of $D$ which preserve $D$
(that is, all permutations that produce a data base isomorphic to $D$) also preserve $R$. An
interesting interpretation of this property is that only the information given by the structure
of the data can be used to differentiate data values; consequently, a query is expressible if
and only if it does not add any differentiation to the one initially available [1].

This idea can be rephrased by stating that the result of a query is invariant w.r.t.
permutations of indistinguishable values; such a permutation was captured with the notion of
automorphism in [11][12]. While the BP-criterion is a natural requirement, it refers to
properties of relations in a given data base instead of queries as a whole. We recall that a query
is an expression of the query language that can be applied to different data bases, leading
to possibly different results. Thus the criterion has been extended to a property of queries as
partial functions from data bases to data bases, which is known nowadays as genericity [6]: it has
been recognized as the capability of the calculus to preserve isomorphisms between data
bases, rather than automorphisms. Genericity is a common requirement for query languages
and it is traditionally related to the data independence principle, which assumes that the data
base is constructed over an abstract domain which is independent of the internal
representation of data. Subsequent research has shown that this approach to the analysis of the
expressiveness of a query language has certain shortcomings [1][10], mainly when new data
models, such as the object-based model, are introduced. Other notions have been proposed
to analyze properties of queries in some new models [5][13][2], pointing out the importance
of extending genericity to more complex models. In [5] languages are classified
w.r.t. the degree of use of the equality predicate, by analyzing the invariance of
queries under different mappings (not necessarily isomorphisms) over the data domain which
are compatible with the relational structure of the data base.

Subsequent advances in data base theory have led to different models that take into 
account the limitations of the relational model when it comes to describing complex situations.
Most of such models have been introduced in the graph-based or object-oriented frameworks, 
but usually their mathematical foundations do not allow a complete study of the expressive 
power of the query languages introduced. In fact, to our knowledge, the only exception is 
the graph-based model GOOD [3]. 

In this paper we introduce a different syntactic characterization of queries computable in 
a data base. Our characterization relies upon the notion of partitions of the domain, where 
each partition represents a level of undifferentiation among objects, values or vertices. Notice 
that an automorphism also can represent a certain level of undifferentiation. Initially we will 
exploit such notion to give two new characterizations of relations expressible in a relational 
data base. Subsequently, we will show how to apply the new framework to analyze a simple 
graph-based model, hence proving that our characterization can be useful in comparing the 
expressive power of different data languages. 

Following the approach of [12], the data models studied in this paper are domain-preserving,
that is, it is not possible to create new vertices or values, but only to query an existing data
base. In our framework, a binary relation over sets of data values is defined, denoted by
$\leftrightarrow$, which relates those sets of values that cannot be differentiated. From the
relation $\leftrightarrow$ we build some sets of partitions that respect $\leftrightarrow$, that is,
all classes in a partition are preserved by $\leftrightarrow$.
We prove that expressiveness of a query language can be stated as the conservation of some
of those partitions, where the exact set of partitions that must be preserved depends on the
data model. The expressibility results we obtain have the following form: given a data base
$D$, let $S$ be a relation or a graph over the domain set of $D$. Then $S$ can be expressed in $D$
if and only if $P(D) = P(D \cup \{S\})$, where $P(D)$ and $P(D \cup \{S\})$ are two sets of partitions
which depend on the model under consideration.

2 Preliminaries 

All sets considered in this paper are assumed to be finite and nonempty. Given a set $U$, a
relation $R$ over $U$ is a subset of the cartesian product $U^a = U \times U \times \cdots \times U$
($a$ times) for some fixed integer $a > 0$, that is, a set of tuples of length $a$, where all
components of a tuple are elements of $U$. The number $a \in \mathbb{N}$ is called the order or
arity of the relation. Given a set $\mathcal{R} = \{R_1, R_2, \ldots, R_p\}$ of relations over $U$,
the pair $(U, \mathcal{R})$ is called a relational database; in this setting, $U$ is the domain of
the database, and $\mathcal{R}$ is the set of relations of the database.

Given a relation $R \in \mathcal{R}$ of a database $(U, \mathcal{R})$, we denote with $D(R)$ the
data domain of $R$, that is, the subset of the elements of the database domain $U$ that are in at
least one tuple of $R$. The notion of data domain is easily extended to the set $\mathcal{R}$ of
relations as the union of the relations' data domains:
$D(\mathcal{R}) = \bigcup_{R \in \mathcal{R}} D(R)$. Without loss of generality, we can assume
that $D(\mathcal{R}) = U$ for every considered database $(U, \mathcal{R})$. This seemingly trivial
requirement is indeed very important, as will become evident after Theorem 3.2. We will
therefore omit the universe set unless it is necessary to avoid ambiguity.

Just as in [12], when referring to a relational database, we use the relational algebra as 
a query language. In relational algebra two binary operators (union and product) and three 
unary operators (projection, equality restriction and inequality restriction) are given. In the 
following definition all relations are defined over the same database domain U. 

Definition 2.1 (Relational Algebra). Let $R$ and $S$ be two relations with the same arity; the
union of $R$ and $S$, denoted by $R \cup S$, is simply the set-theoretical union of the two sets of
tuples.

Given two relations $R$ and $S$ (not necessarily with the same arity), the (cartesian) product
of $R$ and $S$, denoted by $R \times S$, is the set of all possible concatenations of a tuple of $R$
with a tuple of $S$: $\{r \cdot s \mid r \in R, s \in S\}$. The abbreviation $R^k$ is used to express
the relation $R \times \cdots \times R$ ($k$ times).

Let $m$ be the arity of a relation $R$, $q \le m$ a positive integer and
$f : \{1, \ldots, q\} \to \{1, \ldots, m\}$ a function. The projection of $R$ over
$(f(1), \ldots, f(q))$, denoted by $R\,\pi\,(f(1), \ldots, f(q))$, is the relation
$\{(r_{f(1)}, \ldots, r_{f(q)}) : (r_1, \ldots, r_m) \in R\}$.

Now, let $j_1$ and $j_2$ be two integers such that $1 \le j_1, j_2 \le m$, where $m$ is the arity of a
relation $R$. The equality restriction of $R$ on $j_1$ and $j_2$ is the relation, denoted by
$R \mid j_1 = j_2$, that is obtained by taking from $R$ all the tuples for which the $j_1$-th and the
$j_2$-th components are equal: $\{(r_1, \ldots, r_m) \in R : r_{j_1} = r_{j_2}\}$. Analogously, the
inequality restriction of $R$ on $j_1$ and $j_2$, denoted by $R \mid j_1 \ne j_2$, is the relation
obtained by taking from $R$ all the tuples for which the $j_1$-th and the $j_2$-th components are
different: $\{(r_1, \ldots, r_m) \in R : r_{j_1} \ne r_{j_2}\}$.

The five operations just described are sufficient to generate the operations of intersection,
difference, join and division, usually assumed as primitives in Codd's relational algebra; a
proof of this fact can be found, for example, in [8].
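The five operators can be sketched in a few lines of Python. This is only an illustration, assuming that a relation is modelled as a set of tuples and that positions are 1-based as in Definition 2.1; the function names are ours, not the paper's. The last function shows one way to derive the intersection of two binary relations from the primitives.

```python
from itertools import product as cartesian

def union(r, s):
    # Set-theoretical union of two relations of the same arity.
    return r | s

def product(r, s):
    # All concatenations of a tuple of r with a tuple of s.
    return {a + b for a, b in cartesian(r, s)}

def project(r, cols):
    # cols holds the 1-based positions f(1), ..., f(q).
    return {tuple(t[c - 1] for c in cols) for t in r}

def restrict_eq(r, j1, j2):
    # Keep the tuples whose j1-th and j2-th components are equal.
    return {t for t in r if t[j1 - 1] == t[j2 - 1]}

def restrict_neq(r, j1, j2):
    # Keep the tuples whose j1-th and j2-th components differ.
    return {t for t in r if t[j1 - 1] != t[j2 - 1]}

def intersect2(r, s):
    # Intersection of two binary relations, built only from the primitives:
    # pair every tuple of r with every tuple of s, keep the pairs that agree
    # on both components, then project back to arity 2.
    p = product(r, s)
    p = restrict_eq(restrict_eq(p, 1, 3), 2, 4)
    return project(p, (1, 2))
```

For instance, with `r = {(1, 2), (2, 3)}` and `s = {(2, 3), (3, 4)}`, `intersect2(r, s)` keeps only the tuple shared by both relations.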

Given a relational data base $D = (U, \mathcal{R})$, we will denote by $M_E(D)$ the relation which
is the result of applying the expression $E$ (of the relational algebra) to the data base $D$.
Moreover, a relation $S$ over $U$ is said to be expressible from $\mathcal{R}$ if there exists an
expression $E$ whose operands are all relations in $\mathcal{R}$, and such that $M_E(D)$ is equal to
$S$. Following [12], we denote with $\mathrm{BI}(\mathcal{R})$ (basic information contained in the
set of relations $\mathcal{R}$) the set of relations that can be expressed from $\mathcal{R}$.

As observed in [12], $\mathrm{BI}(\mathcal{R})$ is the set of the answers to all possible queries
that can be asked to a relational database that contains the relations $\mathcal{R}$. In [12],
Paredaens gives a characterization of the class $\mathrm{BI}(\mathcal{R})$ based upon appropriate
automorphisms, that is, permutations of the elements of the database domain.

Let $R$ be a relation of order $m$ over a set $U$. As in [12], an automorphism is a bijective
function (that is, a permutation) on $U$. We say that the automorphism $\psi : U \to U$ respects
the relation $R$ or, equivalently, that $\psi$ is $R$-compatible if, for each tuple
$(a_1, a_2, \ldots, a_m) \in U^m$,
$(a_1, a_2, \ldots, a_m) \in R \iff (\psi(a_1), \psi(a_2), \ldots, \psi(a_m)) \in R$.

The compatibility of an automorphism $\psi : U \to U$ with respect to a relation $R$ can be
naturally extended to a set $\mathcal{R}$ of relations in the following way: $\psi$ respects the
relations in $\mathcal{R}$ or, equivalently, $\psi$ is $\mathcal{R}$-compatible if $\psi$ is
$R$-compatible for each relation $R$ in $\mathcal{R}$. Notice that the set of
$\mathcal{R}$-compatible automorphisms is a group^a, where the operation is the composition
of functions and the identity is the identity function (i.e. the function defined as $f(x) = x$).
As in [12], we denote with $\mathrm{Aut}(\mathcal{R})$ the set of all the automorphisms
$\psi : U \to U$ which are $\mathcal{R}$-compatible; with a small abuse of notation, if
$\mathcal{R} = \{R\}$, we will usually write $\mathrm{Aut}(R)$ instead of $\mathrm{Aut}(\{R\})$.
It will be very useful to consider the following representation of $\mathrm{Aut}(\mathcal{R})$.

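Definition 2.2. Let $(U, \mathcal{R})$ be a relational database, with $U = \{d_1, d_2, \ldots, d_n\}$,
and let $\mathrm{Aut}(\mathcal{R}) = \{\psi_1, \psi_2, \ldots, \psi_l\}$ be the set of
$\mathcal{R}$-compatible automorphisms. The following relation of arity $n$:

$$\mathrm{cgr}(\mathcal{R}) = \begin{pmatrix} \psi_1(d_1) & \cdots & \psi_1(d_n) \\ \vdots & \ddots & \vdots \\ \psi_l(d_1) & \cdots & \psi_l(d_n) \end{pmatrix}$$

is called the cogroup-relation of $(U, \mathcal{R})$.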

As we can see, each row (tuple) of the relation $\mathrm{cgr}(\mathcal{R})$ represents one of the
$\mathcal{R}$-compatible automorphisms. Since we do not associate any particular meaning to the
elements of the domain $U$, if $|U| = n$ we can assume, without loss of generality,
$U = \{1, 2, \ldots, n\}$. We can also assume that the first tuple of $\mathrm{cgr}(\mathcal{R})$
represents the identity function on $U$ (which is always present in $\mathrm{Aut}(\mathcal{R})$,
since it is compatible with every nonempty set of relations); as a consequence, it can always be
assumed that the first row of $\mathrm{cgr}(\mathcal{R})$ is the tuple $(1, 2, \ldots, n)$.
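For small domains, the group of compatible automorphisms and the cogroup-relation can be computed by brute force. The sketch below is only illustrative: it assumes relations are sets of tuples, enumerates all permutations of the domain, and uses a hypothetical relation `E` (a directed 4-cycle) that does not come from the paper. Since a permutation acts bijectively on tuples and the relation is finite, checking that the image of the relation equals the relation is enough to establish compatibility.

```python
from itertools import permutations

def automorphisms(domain, relations):
    """Return the compatible automorphisms as rows (psi(d_1), ..., psi(d_n))."""
    dom = sorted(domain)
    rows = []
    # permutations() of a sorted sequence yields the identity first.
    for image in permutations(dom):
        psi = dict(zip(dom, image))
        if all({tuple(psi[a] for a in t) for t in rel} == rel
               for rel in relations):
            rows.append(image)
    return rows

def cogroup_relation(domain, relations):
    # Each row of the cogroup-relation is one compatible automorphism.
    return automorphisms(domain, relations)

# Hypothetical example: a directed 4-cycle, whose automorphisms are rotations.
E = {(1, 2), (2, 3), (3, 4), (4, 1)}
cgr = cogroup_relation({1, 2, 3, 4}, [E])
```

The enumeration is factorial in the size of the domain, so this is suitable only for checking small examples by hand.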

Example 2.1. Let {U,TZ) be a relational data base, with: 

• $U = \{1, 2, 3, 4\}$

• $\mathcal{R} = \{R_1, R_2, R_3\}$, with:
^a A group consists of a set $G$ of elements, a binary associative operation on $G$, and an
identity element $1_G \in G$, such that the operation is closed and invertible in $G$.



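It is easily verified that:

$$\mathrm{Aut}(\{R_1, R_2\}) = \mathrm{Aut}(\{R_1, R_3\}) = \mathrm{Aut}(\{R_2, R_3\}) = \mathrm{Aut}(\{R_1, R_2, R_3\})$$

$$\mathrm{cgr}(\mathcal{R}) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \\ 3 & 4 & 1 & 2 \\ 4 & 3 & 2 & 1 \end{pmatrix}$$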

If we look at the $\mathcal{R}$-compatible automorphisms as permutations over $U$, we can express
$\mathrm{Aut}(\mathcal{R})$ as follows:

$$\mathrm{Aut}(\mathcal{R}) = \{\text{Identity},\ (1\,2)(3\,4),\ (1\,3)(2\,4),\ (1\,4)(2\,3)\}$$



It is not difficult to see that, for a given database $(U, \mathcal{R})$, the set
$\mathrm{Aut}(\mathcal{R})$ of $\mathcal{R}$-compatible automorphisms is indeed a group with
respect to function composition, with the identity function over $U$ as unitary element. In fact,
the identity over $U$ is always in $\mathrm{Aut}(\mathcal{R})$, the inverse of an
$\mathcal{R}$-compatible automorphism is still an $\mathcal{R}$-compatible automorphism, and the
composition of two $\mathcal{R}$-compatible automorphisms is again an $\mathcal{R}$-compatible
automorphism. Since we can always assume $U = \{1, 2, \ldots, n\}$, we can think of
$\mathrm{Aut}(\mathcal{R})$ as a finite permutation group over the set $\{1, 2, \ldots, n\}$, that
is, a subgroup of the symmetric group $S_n$.

In this paper we investigate the relation between expressive power and partitions of the 
database domain. More precisely, we investigate the possibility of characterizing the expressive
power of relational and graph-based databases via one or more theorems conforming to the
following meta theorem.

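Theorem 2.1 (Meta theorem). Let $(U, \mathcal{R})$ be a relational database, and let $S$ be a
relation over $U$. Then $S \in \mathrm{BI}(\mathcal{R}) \iff P(\mathcal{R}) = P(\mathcal{R} \cup \{S\})$,
where $P(\mathcal{R})$ and $P(\mathcal{R} \cup \{S\})$ are sets of partitions over $U$, built from
the sets $\mathcal{R}$ and $\mathcal{R} \cup \{S\}$ of relations respectively.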

3 Expressiveness in Relational Databases 

The relevance of the main result in [12] is that it is the first syntactic characterization of the
relations that can be obtained from a given database $(U, \mathcal{R})$ when the relational algebra
is used as a query language. More precisely, in [12] the following theorem is proved.



Theorem 3.1. Let $(U, \mathcal{R})$ be a relational database, and let $S$ be a relation over $U$.
Then $S \in \mathrm{BI}(\mathcal{R}) \iff \mathrm{Aut}(\mathcal{R}) \subseteq \mathrm{Aut}(S)$ and
$D(S) \subseteq D(\mathcal{R})$.

Basically, Paredaens pointed out the fundamental relation between expressiveness in a
database and the set of automorphisms in the relational model. This result was
subsequently extended in [6] to define in a formal way the notion of genericity, that
is, computable queries [6] have to be invariant with respect to the isomorphisms between
databases. We can restate Theorem 3.1 in a form that will be more convenient for our
purposes.



Theorem 3.2. Let $(U, \mathcal{R})$ be a relational database, and let $S$ be a relation over $U$.
Then $S \in \mathrm{BI}(\mathcal{R}) \iff \mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\})$.

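Proof. First of all, we show that
$S \in \mathrm{BI}(\mathcal{R}) \iff \mathrm{BI}(\mathcal{R}) = \mathrm{BI}(\mathcal{R} \cup \{S\})$.
For the implication from left to right, the inclusion
$\mathrm{BI}(\mathcal{R}) \subseteq \mathrm{BI}(\mathcal{R} \cup \{S\})$ is trivial: the relations
which are expressible from $\mathcal{R}$ are those obtained from $\mathcal{R} \cup \{S\}$ by
expressions that simply ignore the relation $S$. Now let $S \in \mathrm{BI}(\mathcal{R})$ and
$T \in \mathrm{BI}(\mathcal{R} \cup \{S\})$. If the expression that gives $T$ from
$\mathcal{R} \cup \{S\}$ contains some occurrences of the relation $S$, it is sufficient to
replace each occurrence with the expression that gives $S$ from $\mathcal{R}$ to conclude that
$T \in \mathrm{BI}(\mathcal{R})$, and thus
$\mathrm{BI}(\mathcal{R} \cup \{S\}) \subseteq \mathrm{BI}(\mathcal{R})$. Conversely,
$\mathrm{BI}(\mathcal{R}) = \mathrm{BI}(\mathcal{R} \cup \{S\})$ immediately implies
$S \in \mathrm{BI}(\mathcal{R})$, since $S \in \mathrm{BI}(\mathcal{R} \cup \{S\})$.

Since we have established that
$S \in \mathrm{BI}(\mathcal{R}) \iff \mathrm{BI}(\mathcal{R}) = \mathrm{BI}(\mathcal{R} \cup \{S\})$,
the two databases $(U, \mathcal{R})$ and $(U, \mathcal{R} \cup \{S\})$ are basic information
equivalent (that is, every relation of the first database can be obtained from the relations of
the second database and vice versa) if and only if $S$ is expressible from $(U, \mathcal{R})$.
A direct consequence of Theorem 3.1 is that two databases $(U, \mathcal{R}_1)$ and
$(U, \mathcal{R}_2)$ are basic information equivalent if and only if
$D(\mathcal{R}_1) = D(\mathcal{R}_2)$ (which are assumed to be both equal to $U$) and
$\mathrm{Aut}(\mathcal{R}_1) = \mathrm{Aut}(\mathcal{R}_2)$; thus, we can conclude
that $S \in \mathrm{BI}(\mathcal{R}) \iff \mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\})$,
as stated. □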



We observe that, given our assumption that $U = D(\mathcal{R})$, in Theorem 3.2 we can get rid
of the inclusion between the domains, since it is implicit in the fact that $S$ is a relation
over $U$. On the other hand, we cannot ignore the inclusion condition if we suppose that
$D(\mathcal{R}) \subset U$, since in such a situation it is not difficult to exhibit two relations
$R$ and $S$ such that $\mathrm{Aut}(R) = \mathrm{Aut}(\{R, S\})$ but $S \notin \mathrm{BI}(R)$.
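A small brute-force check can make this remark concrete. The sketch below is an illustration under our own assumptions: relations are sets of tuples, `expressible` implements the criterion of Theorem 3.1 literally, and the relations `R` and `S` are a hypothetical instance chosen by us with $D(R) \subset U$; none of these names come from the paper.

```python
from itertools import permutations

def preserves(image, dom, rel):
    # True if the permutation dom -> image maps rel exactly onto itself.
    psi = dict(zip(dom, image))
    return {tuple(psi[a] for a in t) for t in rel} == rel

def aut(domain, relations):
    dom = sorted(domain)
    return {p for p in permutations(dom)
            if all(preserves(p, dom, r) for r in relations)}

def data_domain(relations):
    return {a for r in relations for t in r for a in t}

def expressible(domain, relations, s):
    # Criterion of Theorem 3.1: Aut(R) subset of Aut(S) and D(S) subset of D(R).
    return (aut(domain, relations) <= aut(domain, [s])
            and data_domain([s]) <= data_domain(relations))

# Hypothetical instance where the data domain of R is strictly smaller than U.
U = {1, 2, 3}
R = {(1,), (2,)}
S = {(3,)}
same_aut = aut(U, [R]) == aut(U, [R, S])  # the automorphism groups coincide
ok = expressible(U, [R], S)               # but the domain inclusion fails
```

Here the element 3 never occurs in `R`, so no expression over `R` can produce it, even though adding `S` does not change the automorphism group.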

A notion that seems tightly related to the expressiveness of relations in a database is 
that of indistinguishability between elements of the domain. Intuitively, the idea is that the 
elements of a subset of the domain of a given database are indistinguishable if and only if no 
query to the database is able to divide the set in two parts, one made of the elements that 
occur in the relation resulting from the query and the other made of the elements that do not 
occur in the relation. In such a situation, we say that the set of indistinguishable elements 
cannot be separated by any of the queries that can be presented to the database. Thus, a 
relation resulting from a query to the database can only contain all or none of the elements 
of a non-separable set. 



Theorem 2.1 defines the general framework we propose to investigate the expressive power
of query languages. In this framework different notions of expressible queries can be studied
by considering different sets of partitions. For a given database $(U, \mathcal{R})$, we say that a
set $P(\mathcal{R})$ of partitions of $U$ is a set of valid partitions if and only if it satisfies
Theorem 2.1. By the results in [12], it seems to us quite natural to define the following sets of
valid partitions, namely the orbit partitions and the cycle partitions; indeed, later we will be
able to prove that, in the context of Theorem 2.1, they are equivalent to the characterization of
relations obtainable in a relational data base of [12].



Definition 3.1. Let $(U, \mathcal{R})$ be a relational database, and let
$\mathcal{P} = \{P_1, P_2, \ldots, P_k\}$ be a partition of $U$. $\mathcal{P}$ is an orbit partition
of $U$ with respect to $\mathcal{R}$ if both the following conditions hold:

1. for each relation $R \in \mathcal{R}$ and for each class $P_i \in \mathcal{P}$,
$P_i \cap D(R) = \emptyset$ or $P_i \subseteq D(R)$;

2. for each class $P_i \in \mathcal{P}$ and for each pair $a_1, a_2$ of elements of $P_i$ there
exists an automorphism $\phi \in \mathrm{Aut}(\mathcal{R})$ such that $\phi(a_1) = a_2$, and
$\phi(P_j) = P_j$ for every class $P_j \in \mathcal{P}$.

We denote with $\mathrm{OP}(\mathcal{R})$ the set of all orbit partitions of the given database
$(U, \mathcal{R})$.

Definition 3.2. Let $(U, \mathcal{R})$ be a relational database, and let
$\mathcal{P} = \{P_1, P_2, \ldots, P_k\}$ be a partition of $U$. $\mathcal{P}$ is a cycle partition
of $U$ with respect to $\mathcal{R}$ if both the following conditions hold:

1. for each relation $R \in \mathcal{R}$ and for each class $P_i \in \mathcal{P}$,
$P_i \cap D(R) = \emptyset$ or $P_i \subseteq D(R)$;

2. there exists an automorphism $\phi \in \mathrm{Aut}(\mathcal{R})$ such that for each class
$P_i \in \mathcal{P}$ and for each pair $a_1, a_2$ of elements of $P_i$ there exists an integer $n$
such that $\phi^n(a_1) = a_2$, and $\phi(P_j) = P_j$ for every class $P_j \in \mathcal{P}$.

We denote with $\mathrm{CP}(\mathcal{R})$ the set of all cycle partitions of the given database
$(U, \mathcal{R})$.

As already stated for $\mathrm{Aut}(\mathcal{R})$, if $R$ is a relation we will write
$\mathrm{OP}(R)$ and $\mathrm{CP}(R)$ instead of $\mathrm{OP}(\{R\})$ and $\mathrm{CP}(\{R\})$
respectively.

The following theorem is an alternative formulation of the main result of [12] (the
equivalence of the two formulations follows from Theorem 3.2) which is more useful for our
purposes.

Theorem 3.3. Let $(U, \mathcal{R})$ be a relational database, and let $S$ be a relation over $U$.
Then

$$\mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\}) \iff P(\mathcal{R}) = P(\mathcal{R} \cup \{S\}).$$

Let $(U, \mathcal{R})$ be a relational database, and let $\mathrm{Aut}(\mathcal{R})$ and
$\mathrm{cgr}(\mathcal{R})$ be respectively the group of $\mathcal{R}$-compatible automorphisms and
the cogroup-relation of $\mathcal{R}$. A useful fact proved in [12] is that the cogroup-relation is
expressible from $\mathcal{R}$, that is, $\mathrm{cgr}(\mathcal{R}) \in \mathrm{BI}(\mathcal{R})$.
Using this fact, we are able to prove the following theorem.

Theorem 3.4. $\mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathrm{cgr}(\mathcal{R}))$.



Proof. Since $\mathrm{cgr}(\mathcal{R}) \in \mathrm{BI}(\mathcal{R})$, by Theorem 3.1 we can
conclude that $\mathrm{Aut}(\mathcal{R}) \subseteq \mathrm{Aut}(\mathrm{cgr}(\mathcal{R}))$.

Now, let $\phi \in \mathrm{Aut}(\mathrm{cgr}(\mathcal{R}))$; as we have already observed, $\phi$ is a
permutation of the set $U = D(\mathcal{R})$, as well as of the tuples that compose the relation
$\mathrm{cgr}(\mathcal{R})$. Thus, for each tuple $t \in \mathrm{cgr}(\mathcal{R})$, we have that
$\phi(t) \in \mathrm{cgr}(\mathcal{R})$. In particular, by letting $n$ be the cardinality of $U$,
we have:

$$\phi((1, 2, \ldots, n)) = (\phi(1), \phi(2), \ldots, \phi(n)) \in \mathrm{cgr}(\mathcal{R})$$

Thus, the elements of $U$ are mapped by $\phi$ in such a way that the result is a row of the
cogroup-relation; so we can conclude that $\phi \in \mathrm{Aut}(\mathcal{R})$. □



A direct consequence of Theorem 3.4 is that not only
$\mathrm{cgr}(\mathcal{R}) \in \mathrm{BI}(\mathcal{R})$, as established by Paredaens, but also
$R \in \mathrm{BI}(\mathrm{cgr}(\mathcal{R}))$ for every relation $R \in \mathcal{R}$, since
$D(R) \subseteq D(\mathrm{cgr}(\mathcal{R})) = U$ and
$\mathrm{Aut}(\mathrm{cgr}(\mathcal{R})) = \mathrm{Aut}(\mathcal{R}) \subseteq \mathrm{Aut}(R)$.
As a corollary of Theorem 3.4, if we are interested in studying the expressive power of a given
relational database $(U, \mathcal{R})$, then we can work equally well on the database
$(U, \{\mathrm{cgr}(\mathcal{R})\})$, which has only one relation and, moreover, this relation is
an explicit representation of the finite permutation group $\mathrm{Aut}(\mathcal{R})$.

We now turn our attention to the structure of $\mathrm{OP}(\mathcal{R})$ and
$\mathrm{CP}(\mathcal{R})$. First of all we observe that, thanks to Theorem 3.4, we can get rid of
item 1 in Definitions 3.1 and 3.2 since, by considering the database
$(U, \{\mathrm{cgr}(\mathcal{R})\})$, there is only one relation and, for this relation,
$P_i \subseteq D(\mathrm{cgr}(\mathcal{R})) = U$ holds for each $P_i \in \mathcal{P}$.

To characterize the sets of cycle and orbit partitions we need to recall some notions from 
basic abstract algebra. 

Definition 3.3. Let $X$ be a set and $(G, \cdot, e)$ a group. An action of $G$ on $X$ is a map
$* : G \times X \to X$ such that

1. $\forall x \in X$, $e * x = x$;

2. $\forall g_1, g_2 \in G$, $\forall x \in X$, $(g_1 \cdot g_2) * x = g_1 * (g_2 * x)$.

In group theory it is customary to omit the operator symbols from expressions when
confusion does not arise; so, the expression in item 2 above is usually written as
$(g_1 g_2)x = g_1(g_2 x)$.

Definition 3.4. Let $G$ be a group acting on a set $X$. For $x_1, x_2 \in X$, let $x_1 \sim x_2$ if
and only if there exists $g \in G$ such that $g x_1 = x_2$. It is not difficult to see that $\sim$ is
an equivalence relation on $X$, and thus it induces a partition $\mathcal{P}$ on $X$. The classes of
$\mathcal{P}$ are called the orbits in $X$ under $G$. If $x \in X$, the class containing $x$,
denoted by $Gx$, is called the orbit of $x$ under $G$. In other words,
$Gx = \{y \in X \mid y = gx \text{ for some } g \in G\}$.
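As a minimal sketch of Definition 3.4, assuming permutations of $\{1, \ldots, n\}$ are stored as tuples `g` with `g[x - 1]` the image of `x` (a representation chosen here for illustration), the orbit of an element and the induced partition can be computed directly from the formula for $Gx$. The group `KLEIN` is the cogroup-relation of Example 2.1; `SUB` is one of its subgroups.

```python
def orbit(group, x):
    # Gx = { g(x) : g in G }; valid because `group` is assumed to be a
    # full group (it contains the identity and is closed under composition).
    return frozenset(g[x - 1] for g in group)

def orbit_partition(group):
    n = len(group[0])
    return {orbit(group, x) for x in range(1, n + 1)}

KLEIN = [(1, 2, 3, 4), (2, 1, 4, 3), (3, 4, 1, 2), (4, 3, 2, 1)]
SUB = [(1, 2, 3, 4), (2, 1, 4, 3)]  # subgroup generated by (1 2)(3 4)

whole = orbit_partition(KLEIN)  # a single orbit containing every element
finer = orbit_partition(SUB)    # two orbits: {1, 2} and {3, 4}
```

The orbits of the subgroup refine those of the whole group, since each orbit in `finer` is contained in the single orbit in `whole`.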

It is not difficult to see that the partition induced by the orbits of
$\mathrm{Aut}(\mathcal{R})$ on $U$ satisfies Definition 3.1. In fact, every automorphism
$\phi \in \mathrm{Aut}(\mathcal{R})$ maps each orbit $\mathrm{Aut}(\mathcal{R})x$ into itself
and, given a pair $a_1, a_2$ of elements of $U$, there exists an automorphism that maps $a_1$ to
$a_2$ if and only if $a_1$ and $a_2$ are in the same orbit. Moreover, if $H$ is a subgroup of a
group $G$ acting on the set $X$, then every orbit $Hx$ is a subset of the orbit $Gx$; more
precisely, it is not difficult to prove that the orbits induced by $H$ are a refinement of the
orbits induced by $G$. Since each partition induced by the orbits of a subgroup of
$\mathrm{Aut}(\mathcal{R})$ satisfies Definition 3.1, we have that $\mathrm{OP}(\mathcal{R})$
contains the set of those partitions.

Vice versa, let $\mathcal{P} \in \mathrm{OP}(\mathcal{R})$. It is not difficult to see that the set
of automorphisms $\phi \in \mathrm{Aut}(\mathcal{R})$ that map each class of $\mathcal{P}$ into
itself and that map each element of a class to an element of the same class forms a subgroup of
$\mathrm{Aut}(\mathcal{R})$; moreover, the orbit partition induced by such a subgroup is just
$\mathcal{P}$. As a consequence, $\mathrm{OP}(\mathcal{R})$ is a subset of the set of partitions
induced by all the subgroups of $\mathrm{Aut}(\mathcal{R})$; since the converse inclusion also
holds, the two sets indeed coincide.

Definition 3.5. Let $G$ be a group acting on the set $X$, and let $g \in G$. For
$x_1, x_2 \in X$, let $x_1 \sim x_2$ if and only if there exists an integer $n$ such that
$x_2 = g^n x_1$, where $g^n$ is the application of $g$ for $n$ times. It is not difficult to see
that $\sim$ is an equivalence relation on $X$, and thus it induces a partition $\mathcal{P}$ on
$X$. The classes of $\mathcal{P}$ are called the cycles of $g$ on $X$.

Analogously to what has been said about orbits, it is not difficult to see that the partitions
induced by the cycles of the automorphisms of $\mathrm{Aut}(\mathcal{R})$ satisfy Definition 3.2.
We observe that, while an orbit partition is induced by a subgroup of
$\mathrm{Aut}(\mathcal{R})$, a cycle partition is induced by an automorphism, that is, by an
element of $\mathrm{Aut}(\mathcal{R})$. The class $\mathrm{CP}(\mathcal{R})$ is thus the set of
cycle partitions obtained by considering every element of $\mathrm{Aut}(\mathcal{R})$.



Definition 3.6. Let $G$ be a group acting on the set $X$ and let $g$ be a permutation in $G$. Then
the orbits of the (cyclic) group $\langle g \rangle$ generated by $g$ are the cycles of $g$. Since
$\langle g \rangle$ is a subgroup of $G$, we have immediately that every cycle partition of $G$ is
also an orbit partition of $G$, that is,
$\mathrm{CP}(\mathcal{R}) \subseteq \mathrm{OP}(\mathcal{R})$.

Example 2.1 can be used to show that the converse does not generally hold: not every
orbit partition is also a cycle partition. In fact we have:

$$\mathrm{CP}(\mathcal{R}) = \{\{\{1\},\{2\},\{3\},\{4\}\},\ \{\{1,2\},\{3,4\}\},\ \{\{1,3\},\{2,4\}\},\ \{\{1,4\},\{2,3\}\}\}$$

$$\mathrm{OP}(\mathcal{R}) = \mathrm{CP}(\mathcal{R}) \cup \{\{\{1,2,3,4\}\}\}$$
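The two sets above can be recomputed mechanically from the cogroup-relation of Example 2.1. The following sketch is illustrative only: permutations are tuples with `g[x - 1]` the image of `x`, subgroups are found by brute force as the nonempty subsets closed under composition (which suffices in a finite group), and all function names are our own.

```python
from itertools import combinations

def compose(g, h):
    # (g after h)(x) = g(h(x)).
    return tuple(g[h[i] - 1] for i in range(len(h)))

def cycles(g):
    # Partition of {1, ..., n} into the cycles of the permutation g.
    seen, parts = set(), []
    for x in range(1, len(g) + 1):
        cycle, y = [], x
        while y not in seen:
            seen.add(y)
            cycle.append(y)
            y = g[y - 1]
        if cycle:
            parts.append(frozenset(cycle))
    return frozenset(parts)

def orbits(subgroup):
    n = len(next(iter(subgroup)))
    return frozenset(frozenset(g[x - 1] for g in subgroup)
                     for x in range(1, n + 1))

def subgroups(group):
    # A nonempty subset of a finite group closed under composition is a subgroup.
    for k in range(1, len(group) + 1):
        for subset in combinations(group, k):
            s = set(subset)
            if all(compose(a, b) in s for a in s for b in s):
                yield s

KLEIN = [(1, 2, 3, 4), (2, 1, 4, 3), (3, 4, 1, 2), (4, 3, 2, 1)]
CP = {cycles(g) for g in KLEIN}
OP = {orbits(s) for s in subgroups(KLEIN)}
```

With this input, `OP` contains one partition more than `CP`: the single-class partition induced by the whole group, which is not the cycle partition of any single element.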

As noted above, Theorem 3.4 allows us to deal only with cogroup-relations instead of sets
of arbitrary relations. The same can be done when working with cycle and orbit partitions:
since the cycles and orbits that form the partitions in $\mathrm{CP}(\mathcal{R})$ and
$\mathrm{OP}(\mathcal{R})$ are completely determined by the elements and the subgroups of
$\mathrm{Aut}(\mathcal{R})$ respectively, by Theorem 3.4 we can conclude that
$\mathrm{CP}(\mathcal{R}) = \mathrm{CP}(\mathrm{cgr}(\mathcal{R}))$ and
$\mathrm{OP}(\mathcal{R}) = \mathrm{OP}(\mathrm{cgr}(\mathcal{R}))$.

It is possible to show that both the set $\mathrm{CP}(\mathcal{R})$ of cycle partitions and the
set $\mathrm{OP}(\mathcal{R})$ of orbit partitions of a given database $(U, \mathcal{R})$
constitute a partially ordered set (poset) with respect to the binary relation $\le$, where
$\mathcal{P}_1 \le \mathcal{P}_2$ iff each class of $\mathcal{P}_1$ is contained in some class
of $\mathcal{P}_2$; here $\mathcal{P}_1$ and $\mathcal{P}_2$ are two partitions in
$P(\mathcal{R})$, and $P(\mathcal{R})$ is equal to $\mathrm{CP}(\mathcal{R})$ or
$\mathrm{OP}(\mathcal{R})$. In fact, it is not difficult to see that $\le$ is reflexive,
antisymmetric and transitive: that is, $\le$ is an order relation over both
$\mathrm{CP}(\mathcal{R})$ and $\mathrm{OP}(\mathcal{R})$. One notable difference between the
posets $(\mathrm{OP}(\mathcal{R}), \le)$ and $(\mathrm{CP}(\mathcal{R}), \le)$ is that the first
always has a maximum element, corresponding to the orbits of the entire
$\mathrm{Aut}(\mathcal{R})$, while the second may not have a maximum element, as
shown above with reference to Example 2.1, where $\mathrm{Aut}(\mathcal{R})$ is the so-called
Klein group. Instead, both posets have a minimum element, corresponding to the cycles (equal to
the orbits) induced by the identity element of $\mathrm{Aut}(\mathcal{R})$: that is, the trivial
partition, where each class is a singleton.

In order to prove our main results we need some definitions and some well-known properties
of finite groups. Here we just recall the notion of stabilizer; we refer the reader to an
introductory book on abstract algebra, such as [9], for the notion of coset and its properties.

Definition 3.7. Let $G$ be a group acting on a set $X$, and let $x \in X$. The subgroup $G_x$ of
$G$ defined as $G_x = \{g \in G \mid gx = x\}$ is called the stabilizer of $x$ in $G$.

It is not difficult to see that if $G$ is a group which acts on the set $X$, and $x \in X$, then
the stabilizer $G_x$ of $x$ can be considered as a group which acts on the set
$X \setminus \{x\}$. The following are two well-known results in group theory: Lagrange's
theorem, which relates the cardinality of a given group $G$ and the cardinality of a given
subgroup $H$ of $G$ to the number of left cosets of $G$ with respect to $H$, and a theorem which
expresses the cardinality of the orbit of $G$ containing $x$ as the number of left cosets of $G$
with respect to the stabilizer $G_x$.

Theorem 3.5 (Lagrange's Theorem). Let $G$ be a finite group, and let $H$ be a subgroup of $G$.
Then $|G| = (G : H) \cdot |H|$, where $(G : H)$ is the number of left cosets of $G$ with respect
to $H$, and is usually called the index of $H$ in $G$.

Theorem 3.6. Let $G$ be a finite group acting on a set $X$, and let $x \in X$. Then
$|Gx| = (G : G_x)$, that is, there exists a one-to-one correspondence between the elements of the
orbit $Gx$ of $x$ under $G$ and the left cosets of the stabilizer $G_x$ in $G$.
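Combining Theorems 3.5 and 3.6 gives $|G| = |Gx| \cdot |G_x|$, which is the identity used in the proofs that follow. A short numerical check, written as a sketch under our own conventions (permutations as tuples with `g[x - 1]` the image of `x`; the symmetric group $S_4$ and the Klein group of Example 2.1 as test cases):

```python
from itertools import permutations

def orbit(group, x):
    # Gx: all images of x under the elements of the group.
    return {g[x - 1] for g in group}

def stabilizer(group, x):
    # G_x: the elements of the group that fix x.
    return [g for g in group if g[x - 1] == x]

S4 = list(permutations((1, 2, 3, 4)))
KLEIN = [(1, 2, 3, 4), (2, 1, 4, 3), (3, 4, 1, 2), (4, 3, 2, 1)]

# |G| = |Gx| * |G_x| for every group and every point tested.
checks = [len(orbit(G, x)) * len(stabilizer(G, x)) == len(G)
          for G in (S4, KLEIN) for x in (1, 2, 3, 4)]
```

In $S_4$ each point has an orbit of size 4 and a stabilizer of order 6; in the Klein group each point has an orbit of size 4 and a trivial stabilizer.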

We are now able to prove the following theorem. 

Theorem 3.7. Let $G$ be a subgroup of the symmetric group $S_n$, and let $H$ be a subgroup of
$G$. If the orbit partitions of $G$ and $H$ are the same, then $H = G$.

Proof. We prove the assertion by induction on $n$. For $n \le 2$ the theorem can be proved by
direct inspection of the subgroups of $S_n$.

Now, let us suppose that the theorem is true for $n - 1$, and let us show that it holds also
for $n$. We first observe that since the orbit partitions of $G$ and $H$ are the same, then also
the orbits $Gn$ and $Hn$ of the element $n$ with respect to $G$ and $H$ are the same. Now, if we
take all the partitions having $\{n\}$ as a class, we get the orbit partitions induced by the
stabilizers $G_n$ and $H_n$ of the element $n$ with respect to $G$ and $H$. These orbit partitions
are equal and thus, by induction hypothesis, $G_n = H_n$. By Lagrange's theorem, we can express
the cardinalities of $G$ and $H$ with respect to the cardinalities of their stabilizers as
$|G| = (G : G_n) \cdot |G_n|$ and $|H| = (H : H_n) \cdot |H_n|$, where $(G : G_n)$ and
$(H : H_n)$ are the indices, respectively, of the stabilizer $G_n$ in $G$ and of the stabilizer
$H_n$ in $H$. By Theorem 3.6 we can infer that $|G| = |Gn| \cdot |G_n|$ and
$|H| = |Hn| \cdot |H_n|$. Since $|Gn| = |Hn|$ and $|G_n| = |H_n|$, we can conclude that $G$ and
$H$ have the same order and thus, $H$ being a subgroup of $G$, that $G = H$. □



Theorem 3.7 allows us to show that the orbit partitions of a given database satisfy the
meta theorem (Theorem 2.1); in fact, the following result provides a first characterization of
expressible queries in relational databases alternative to the one originally given by Paredaens.

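Corollary 3.8. Let $(U, \mathcal{R})$ be a relational database, and let $S$ be a relation over
$U$. Then

$$\mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\}) \iff \mathrm{OP}(\mathcal{R}) = \mathrm{OP}(\mathcal{R} \cup \{S\})$$

Proof. If $\mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\})$, since the orbit
partitions are completely determined by the subgroups of $\mathrm{Aut}(\mathcal{R})$, we obtain
that $\mathrm{OP}(\mathcal{R}) = \mathrm{OP}(\mathcal{R} \cup \{S\})$.

For the converse, we observe that $\mathrm{Aut}(\mathcal{R})$ is a subgroup of the symmetric
group $S_n$, and $\mathrm{Aut}(\mathcal{R} \cup \{S\})$ is a subgroup of
$\mathrm{Aut}(\mathcal{R})$. By hypothesis, the orbit partitions of $\mathrm{Aut}(\mathcal{R})$
and $\mathrm{Aut}(\mathcal{R} \cup \{S\})$ are equal and thus, by Theorem 3.7,
$\mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\})$. □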



A second characterization of expressible queries in relational databases can be obtained 
by considering cycle partitions instead of orbit partitions. We need the following lemma. 

Lemma 3.1. Let $G$ be a subgroup of the symmetric group $S_n$, and let $H$ be a subgroup of
$G$. If the cycle partitions of $G$ and $H$ are the same, then also the orbits of $G$ and $H$ are
the same, that is, $Hx = Gx$ for every $x \in \{1, 2, \ldots, n\}$.

Proof. Since the orbit $Gx$ is the set of elements of $\{1, 2, \ldots, n\}$ which are reachable
from $x$ through some element $g$ of $G$, while a cycle containing $x$ is the set of elements
which are reachable from $x$ through the powers of one fixed element $g$ of $G$, one method to
build $Gx$ from the cycle partitions of $G$ is given by Algorithm 1.
Algorithm 1: BuildOrbit

Data: an integer x ∈ {1, ..., n}, a subgroup G of S_n, a set CP of cycle partitions
Result: Result

Result ← {x};
repeat
    Modified ← false;
    foreach partition P in CP do
        Cycles ← the smallest union of cycles of P which covers Result;
        if Cycles \ Result ≠ ∅ then
            Result ← Result ∪ Cycles;
            Modified ← true;
        end
    end
until Modified = false;



Algorithm [T] computes the least subset O of {1, 2, . . . , n} which contains x and such that, 
for every cycle partition V oi C, O is the union of some cycles in V; it is not difficult to see 
that O is, indeed, the orbit Cx. 

Since the cycle partitions of $G$ and $H$ are the same by hypothesis, the orbits computed by
the algorithm above will be the same for $G$ and $H$, for every choice of $x \in \{1, 2, \dots, n\}$. $\square$
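The orbit construction of Algorithm 1 can be sketched in Python as follows. This is only an illustrative sketch: the function name `build_orbit` and the representation of each cycle partition as a list of sets are assumptions made here, not notation from the paper.

```python
def build_orbit(x, cycle_partitions):
    """Least set containing x that is a union of cycles in every given partition."""
    result = {x}
    modified = True
    while modified:
        modified = False
        for partition in cycle_partitions:
            # Smallest union of cycles of this partition that covers `result`.
            cycles = set()
            for cycle in partition:
                if result & set(cycle):
                    cycles |= set(cycle)
            if cycles - result:
                result |= cycles
                modified = True
    return result


# Two partitions that merge 1 with 2 and 2 with 3; element 4 stays fixed.
partitions = [
    [{1, 2}, {3}, {4}],
    [{1}, {2, 3}, {4}],
]
print(build_orbit(1, partitions))
print(build_orbit(4, partitions))
```

The outer loop is repeated until a full pass over all partitions adds nothing, which mirrors the `repeat ... until Modified = false` structure of the pseudocode.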

We are now ready to prove the following theorem. 

Theorem 3.9. Let G be a subgroup of the symmetric group Sn, and let H be a subgroup of 
G. If the cycle partitions of G and H are the same, then H = G. 



Proof. By Lemma 3.1, the orbits of $G$ and $H$ are the same. Thus we can prove the theorem
by the same argument used for Theorem 3.7. $\square$



A direct consequence of Theorem 3.9 is that the cycle partitions of a given database 
satisfy Theorem Schema II; thus, the following theorem provides a second characterization of 
expressible queries in relational databases alternative to the one originally given by Paredaens. 



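Theorem 3.10. Let $(U, \mathcal{R})$ be a relational database, and let $S$ be a relation over $U$. Then:

$\mathrm{Aut}(\mathcal{R}) = \mathrm{Aut}(\mathcal{R} \cup \{S\}) \iff \mathrm{CP}(\mathcal{R}) = \mathrm{CP}(\mathcal{R} \cup \{S\})$

The proof is analogous to the one given for Theorem 3.8.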



A final observation is due about Theorems 3.8 and 3.10. Even though there is a strong
resemblance between our meta Theorem 2.1 and Theorem 3.2, our results can be expressed
neither in the form $S \in BI(\mathcal{R}) \iff \mathrm{OP}(\mathcal{R}) \subseteq \mathrm{OP}(S)$ and $D(S) \subseteq D(\mathcal{R})$ nor in the form
$S \in BI(\mathcal{R}) \iff \mathrm{CP}(\mathcal{R}) \subseteq \mathrm{CP}(S)$ and $D(S) \subseteq D(\mathcal{R})$, as shown in the next example.

Example 3.1. Let $(U, \{R\})$ and $(U, \{S\})$ be two relational databases, with:

• $U = \{1, 2, 3, 4, 5\}$

• $R$ =
    1 2 3 4 5
    2 3 4 5 1
    3 4 5 1 2
    4 5 1 2 3
    5 1 2 3 4

• $S$ =
    1 2 3 4 5
    2 3 5 1 4
    3 5 4 2 1
    5 4 1 3 2
    4 1 2 5 3

Notice that $\mathrm{Aut}(R)$ is the cyclic group generated by the permutation $(1\,2\,3\,4\,5)$, while
$\mathrm{Aut}(S)$ is the cyclic group generated by the permutation $(1\,2\,3\,5\,4)$:

    Aut(R)          Aut(S)
    Identity        Identity
    (1 2 3 4 5)     (1 2 3 5 4)
    (1 3 5 2 4)     (1 3 4 2 5)
    (1 4 2 5 3)     (1 5 2 4 3)
    (1 5 4 3 2)     (1 4 5 3 2)

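From $\mathrm{Aut}(R)$ and $\mathrm{Aut}(S)$ we can easily obtain $\mathrm{CP}(R) = \mathrm{OP}(R) = \mathrm{CP}(S) = \mathrm{OP}(S) =
\{\{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}, \{\{1, 2, 3, 4, 5\}\}\}$. Clearly, $S$ is not expressible from $R$, since we
have $D(R) = D(S)$ but $\mathrm{Aut}(R) \not\subseteq \mathrm{Aut}(S)$; on the other hand, $\mathrm{OP}(R) \subseteq \mathrm{OP}(S)$ and
$D(S) \subseteq D(R)$, and $\mathrm{CP}(R) \subseteq \mathrm{CP}(S)$ and $D(S) \subseteq D(R)$. The fact that $S$ is not expressible
from $R$ can be correctly determined through orbit partitions or through cycle partitions
by observing that $\mathrm{OP}(\{R, S\}) = \{\{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}\} \neq \mathrm{OP}(R)$ or
$\mathrm{CP}(\{R, S\}) = \{\{\{1\}, \{2\}, \{3\}, \{4\}, \{5\}\}\} \neq \mathrm{CP}(R)$.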
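The claims of Example 3.1 can be checked by brute force over all 120 permutations of $U$. The sketch below is only illustrative: the helper names (`automorphisms`, `cycle_partition`, `cycle_partitions`) are hypothetical, and an automorphism is taken here to be a permutation of $U$ that maps each relation, seen as a set of tuples, onto itself.

```python
from itertools import permutations

U = (1, 2, 3, 4, 5)
R = {(1, 2, 3, 4, 5), (2, 3, 4, 5, 1), (3, 4, 5, 1, 2), (4, 5, 1, 2, 3), (5, 1, 2, 3, 4)}
S = {(1, 2, 3, 4, 5), (2, 3, 5, 1, 4), (3, 5, 4, 2, 1), (5, 4, 1, 3, 2), (4, 1, 2, 5, 3)}


def automorphisms(relations):
    """Permutations of U (as dicts) that map every relation onto itself."""
    auts = []
    for image in permutations(U):
        p = dict(zip(U, image))
        if all({tuple(p[v] for v in t) for t in rel} == rel for rel in relations):
            auts.append(p)
    return auts


def cycle_partition(p):
    """Partition of U into the cycles of the permutation p."""
    seen, cycles = set(), set()
    for start in U:
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = p[v]
        if cycle:
            cycles.add(frozenset(cycle))
    return frozenset(cycles)


def cycle_partitions(relations):
    """Set of cycle partitions of all automorphisms of the given relations."""
    return {cycle_partition(p) for p in automorphisms(relations)}


cp_r, cp_s, cp_rs = cycle_partitions([R]), cycle_partitions([S]), cycle_partitions([R, S])
print(len(automorphisms([R])), len(automorphisms([S])), len(automorphisms([R, S])))  # 5 5 1
print(cp_r == cp_s, cp_rs == cp_r)  # True False
```

The two groups have order 5 and the same cycle partitions, yet only the identity preserves both relations, so adding $S$ to $R$ changes the cycle partitions.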

4 Expressiveness in graph-based data bases 

In this section we study a simple graph-based model where two labeled graphs are used to
model data bases. A data base consists of two distinct layers: a schema layer and a structure
layer; the objects can be found in the latter, while the former describes the data organization.
Each layer is a labeled weakly-connected directed graph; moreover, there exists a function
that maps a schema into a structure: such a function will be called an extension. Both vertices
and edges of the graphs are labeled, and we can assume that the sets of edge labels and vertex
labels, as well as schema labels and structure labels, are disjoint. An example of data base is
represented in Figures 1 and 2, from which it is easy to note how the schema and the structure
are closely related; the following definitions only formalize the intuitive idea.

Definition 4.1 (Schema). A schema graph, in short schema, is a triple $\Sigma = (G, \lambda_1, \lambda_2)$,
where $G = (V, E)$ is an oriented, weakly-connected graph, and $\lambda_1$, $\lambda_2$ are respectively the
injective functions that map each node (resp. edge) to its label.

Definition 4.2 (Structure). A structure is a triple $\mathcal{S} = (S, \lambda'_1, \lambda'_2)$, with $S$ a colored oriented
graph $S = (V, E, \mu)$, where $V$ is the set of nodes of the structure, $E \subseteq V \times V$ is the set of
edges, $\lambda'_1$, $\lambda'_2$ are respectively the injective functions that map each node (resp. edge) to its
label, and $\mu : E \to \Gamma$ is a labeling of the edges over the finite alphabet $\Gamma$, called coloring of
the structure.
Figure 1: Example of schema

In the following, we will use the set $\Gamma = \{true, false\}$ of colors, which allows us to specify whether
a link between object instances in $\mathcal{S}$ is actual or not. In the example of Fig. 2 only the links
labeled true are represented, and the presence (or the absence) of links labeled false does
not change the data stored in the data base. Fig. 3 represents a part of the structure,
where false links are represented with dotted arrows.

The schema and the structure must be strongly correlated; in fact there must exist a 
function, called extension (denoted by Ext), mapping the schema into the structure. In 
order to have a sound definition of extension some restrictions must be enforced, as pointed 
out in the following definition, where Pow{A) stands for the family of all nonempty subsets 
of ^. 

Informally Ext maps each vertex of the schema into some vertices of the structure and 
each edge of the schema into some edges of the structure. 

Definition 4.3 (Extension). Let $\Sigma = (G = (V, E), \lambda_1, \lambda_2)$ be a schema and $\mathcal{S} = (S, \lambda'_1, \lambda'_2)$
a structure, where $S = (V', E', \mu)$. Then $\mathcal{S}$ is an extensional structure of $\Sigma$ if there is a
function (the extension) from $\Sigma$ to $\mathcal{S}$, $Ext : V \to Pow(V')$, such that:

1. $\{Ext(v) : v \in V(G)\}$ is a partition $\{V_1, \dots, V_n\}$ of the set $V'$;

2. for every $x \in V_i$, $y \in V_j$, the pair $(x, y) \in E'$ iff $(Ext^{-1}(x), Ext^{-1}(y)) \in E$.

Notice that the first point of the definition of extension implies that the function $Ext^{-1}$ is
well defined. In the following, if $\mathcal{S}$ is the extensional structure of $\Sigma$, then we write $\mathcal{S} = Ext(\Sigma)$
and we will simply say that $\mathcal{S}$ is a structure of $\Sigma$. Given two vertices $v_1$ and $v_2$ of the schema,
connected with a link $(v_1, v_2)$, in the structure there must exist all links $(w_1, w_2)$ for
$w_1 \in Ext(v_1)$, $w_2 \in Ext(v_2)$. Such a requirement justifies the introduction of a labeling (and
especially of a true-false labeling) in order to have a reasonable graph-based model.
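The two conditions of Definition 4.3 can be checked mechanically on small examples. The following is a minimal sketch under stated assumptions: graphs are given as plain Python sets of nodes and of ordered pairs, `ext` is a dict from schema nodes to sets of structure nodes, and the function name `is_extension` and the sample node names are hypothetical.

```python
def is_extension(schema_nodes, schema_edges, struct_nodes, struct_edges, ext):
    """Check the two conditions of Definition 4.3 for a candidate extension."""
    # Condition 1: the nonempty images Ext(v) partition the structure nodes.
    blocks = [ext[v] for v in schema_nodes]
    if any(not block for block in blocks):
        return False
    if sum(len(block) for block in blocks) != len(struct_nodes):
        return False
    if set().union(*blocks) != set(struct_nodes):
        return False
    # Ext^{-1} is well defined because the images are pairwise disjoint.
    inverse = {w: v for v in schema_nodes for w in ext[v]}
    # Condition 2: (x, y) is a structure edge iff its preimage is a schema edge.
    return all(
        ((x, y) in struct_edges) == ((inverse[x], inverse[y]) in schema_edges)
        for x in struct_nodes
        for y in struct_nodes
    )


schema_nodes = {"Person", "City"}
schema_edges = {("Person", "City")}
struct_nodes = {"p1", "p2", "c1"}
ext = {"Person": {"p1", "p2"}, "City": {"c1"}}

print(is_extension(schema_nodes, schema_edges, struct_nodes,
                   {("p1", "c1"), ("p2", "c1")}, ext))   # all links present
print(is_extension(schema_nodes, schema_edges, struct_nodes,
                   {("p1", "c1")}, ext))                 # link (p2, c1) missing
```

The second call fails because every pair of structure nodes lying over a schema link must itself be linked, which is exactly why the true/false coloring is needed to record which links are actual.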

Definition 4.4 (Data base). A data base $\mathcal{B}$ is a pair $(\Sigma, \mathcal{S})$, where $\Sigma$ is a schema and $\mathcal{S}$ is
an extensional structure of $\Sigma$.

The schema describes the conceptual organization of the data, while the data content or 
instantiation of the data base is given by the extensional structure associated to the schema. 
Figure 2: Example of structure

It is not hard to notice that, given a schema, there is a one-to-one correspondence between
structures and extension functions; therefore we will sometimes use the pair $(\Sigma, Ext)$ as a
data base.

Some preliminary definitions are required for introducing our query language. Given a
partial function $f : A \to B$ (i.e. a function where each element of $A$ can be associated to
one or none of the elements of $B$), by $Dom(f)$ we denote the domain of $f$, that is the set of
elements $x \in A$ such that $f(x)$ is defined. Let $f, g$ be two partial functions from the set $A$ to
the set $Pow(B)$. Then $f$ is a restriction of $g$, denoted by $f \le g$, if $Dom(f) \subseteq Dom(g)$ and
for every $x \in Dom(f)$, $f(x) \subseteq g(x)$. Moreover by $Im(f)$ we denote the set obtained as the union
of all images of elements in $Dom(f)$: formally $Im(f) = \bigcup_{x \in Dom(f)} f(x)$.
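These notions translate directly to dictionaries. In the sketch below a partial function into $Pow(B)$ is modeled, as an assumption of this illustration, by a Python dict whose keys form $Dom(f)$ and whose values are nonempty sets; the helper names are hypothetical.

```python
def dom(f):
    """Domain of a partial function stored as a dict."""
    return set(f)


def im(f):
    """Union of the images f(x) over all x in Dom(f)."""
    return set().union(*f.values())


def is_restriction(f, g):
    """True when Dom(f) is contained in Dom(g) and f(x) is a subset of g(x) on Dom(f)."""
    return all(x in g and f[x] <= g[x] for x in f)


g = {"a": {1, 2}, "b": {3}}
f = {"a": {1}}
print(dom(f), im(g), is_restriction(f, g), is_restriction(g, f))
```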

Definition 4.5 (Instance). Let $\mathcal{B} = (\Sigma, Ext)$ be a data base. An instance of $\mathcal{B}$ is a restriction
$f$ of $Ext$ such that $Dom(f)$ induces a weakly-connected subgraph of $\Sigma$.

Figure 3: Example of structure with false links

The following notations will be used in the rest of the paper. The set $\mathcal{I}(\mathcal{B})$ is the set
of all instances of $\mathcal{B}$. Let $\mathcal{I}$ be a subset of $\mathcal{I}(\mathcal{B})$; then by $Im(\mathcal{I})$ we mean the set of nodes
of the structure of $\mathcal{B}$ that is the union of all images of instances in $\mathcal{I}$, while $Domain(\mathcal{I})$ is
the union of all domains of instances in $\mathcal{I}$. An element in $Im(\mathcal{I})$ is called a value, while an
element in $Domain(\mathcal{I})$ is called a name. Then the image of a name $x \in Domain(\mathcal{I})$ is the
subset $A$ of $Im(\mathcal{I})$ such that $A = \bigcup_{f \in \mathcal{I}} f(x)$. For a value $y \in Im(\mathcal{I})$, the inverse image of $y$,
denoted by $name(y)$, is the name of $Domain(\mathcal{I})$ that is mapped by $Ext$ to a set containing
the element $y$. Similarly, given a set $A$ of values, the inverse image of $A$ is the set $names(A)$
which is the union of all inverse images of the values in $A$.



4.1 The graph algebra 

Our graph data model is proposed as a domain-preserving data base, along the same lines as 
other papers where the expressiveness of the relational algebra is studied [HI |3] , and it gives 
a formal embedding for languages used for the retrieval of graph-structured information [11] . 
The requirement that we are dealing with domain-preserving data bases reflects in the query 
language: in fact we have no operation for creating new elements or modifying the schema 
graph, and all operations must preserve the schema and the original structure. 

The main consequence of the assumption that our model is domain preserving consists in 
the fact that we will deal with a schema which is mapped to an instance through an extensional 
mapping. Therefore there is a complete equivalence between subgraphs of the structure and 
restrictions of the extensional mapping. We are now able to introduce the operations of our
graph algebra: following the reasoning above, we describe the operations as acting on partial
functions whenever this allows a simpler formulation.

Definition 4.6 (Addition). Let $\mathcal{B} = (\Sigma, \mathcal{S})$ be a data base and let $f_1, f_2 \in \mathcal{I}(\mathcal{B})$. The
Addition of $f_1$ and $f_2$, denoted as $f_1 \oplus f_2$, is the following function over domain $Dom(f_1)$.
The operation is defined only if $Dom(f_1) = Dom(f_2)$:

$(f_1 \oplus f_2)(x) = f_1(x) \cup f_2(x)$



Definition 4.7 (Product). Let $\mathcal{B} = (\Sigma, \mathcal{S})$ be a data base, and let $f_1, f_2$ be two functions
in $\mathcal{I}(\mathcal{B})$. The Product of $f_1$ and $f_2$, denoted as $f_1 \otimes f_2$, is the instance in $\mathcal{I}(\mathcal{B})$ defined as
follows:

$$(f_1 \otimes f_2)(x) = \begin{cases}
f_1(x) \cap f_2(x) & \text{if } x \in Dom(f_1) \cap Dom(f_2),\ f_1(x) \cap f_2(x) \neq \emptyset \\
f_2(x) & \text{if } x \in Dom(f_2) - Dom(f_1) \\
f_1(x) & \text{if } x \in Dom(f_1) - Dom(f_2) \\
\text{undefined} & \text{otherwise}
\end{cases} \qquad (1)$$

The product is defined only if $Dom(f_1 \otimes f_2)$ induces a weakly-connected subgraph of $\Sigma$.

Definition 4.8 (Projection). Let $\mathcal{B} = (\Sigma, \mathcal{S})$ be a data base. Let $f$ be a function in $\mathcal{I}(\mathcal{B})$ and
let $A$ be a subset of the domain of $f$, such that $A$ induces in $\Sigma$ a weakly-connected subgraph.
The projection of $f$ on $A$, denoted as $\Pi_A(f)$, is the instance defined as follows:

$$\Pi_A(f)(x) = \begin{cases}
f(x) & \text{if } x \in A \\
\text{undefined} & \text{if } x \notin A
\end{cases}$$

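Definition 4.9 (Difference). Let $\mathcal{B} = (\Sigma, \mathcal{S})$ be a data base. Let $f_1, f_2$ be two functions in
$\mathcal{I}(\mathcal{B})$ over the same domain $A$. The difference of $f_1$ by $f_2$, denoted as $f_1 \ominus f_2$, is the following
instance:

$$(f_1 \ominus f_2)(x) = \begin{cases}
f_1(x) - f_2(x) & \text{if } x \in A,\ f_1(x) - f_2(x) \neq \emptyset \\
\text{undefined} & \text{otherwise}
\end{cases}$$

The difference is defined only if $Dom(f_1 \ominus f_2)$ induces a weakly-connected subgraph of $\Sigma$.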
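As an illustration of how addition, product and projection act on partial functions, here is a minimal sketch. Instances are modeled as dicts from names to nonempty sets of values (an assumption of this sketch), the function names are hypothetical, and the weak-connectivity side conditions on the resulting domain are deliberately not checked.

```python
def addition(f1, f2):
    """Pointwise union; only defined when the two domains coincide."""
    if set(f1) != set(f2):
        raise ValueError("addition requires equal domains")
    return {x: f1[x] | f2[x] for x in f1}


def product(f1, f2):
    """Intersect on shared names, keep the image on names owned by one operand."""
    result = {}
    for x in set(f1) | set(f2):
        if x in f1 and x in f2:
            common = f1[x] & f2[x]
            if common:              # an empty intersection leaves x undefined
                result[x] = common
        elif x in f1:
            result[x] = f1[x]
        else:
            result[x] = f2[x]
    return result


def projection(f, names):
    """Keep only the names in `names`; every other name becomes undefined."""
    return {x: f[x] for x in f if x in names}


f1 = {"a": {1, 2}, "b": {3}}
f2 = {"a": {2}, "c": {4}}
print(product(f1, f2))
print(addition({"a": {1}}, {"a": {2}}))
print(projection(f1, {"a"}))
```

A name on which the two operands of a product share no value simply disappears from the result, which is why the definition still has to require that the remaining domain be weakly connected.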

Since the coloring of the edges encodes the fact that a relation between two objects is
actual or not, it is natural that the query language has some tools for exploiting such coloring.
In our model we will need to extract instances where "similar" edges have the same color. The
definition of selector is the first step in such direction.

Definition 4.10 (Selector). Let $\Sigma$ be the schema of a data base $\mathcal{B}$. Then a selector of $\Sigma$ is
a pair $(G, \sigma)$ consisting of a weakly-connected subgraph $G$ of $\Sigma$ and a coloring $\sigma : E \to \Gamma$ of
the edges of $G$.

Querying for a selector in a data base returns all subgraphs of the structure that are
isomorphic to the selector: each such subgraph is indeed called a simple instance. Moreover,
it is natural to define an operation of selection that yields the instances which are
compatible with a coloring of the schema over the alphabet $\Gamma$. This is the last operation of
our algebra.

Definition 4.11 (Simple instance). Let $(\Sigma, Ext)$ be a data base, where $S = (V', E', \mu)$. Let
$(G_s, \sigma)$ be a selector of $\Sigma$, where $G_s = (V(G_s), E(G_s))$. Then a simple instance induced by
the selector $(G_s, \sigma)$ is a restriction $f$ of $Ext$ such that $Domain(f) = V(G_s)$, $|f(v)| = 1$ for each
$v \in Domain(f)$ and $\mu(f(x), f(y)) = \sigma(x, y)$ for each $(x, y) \in E(G_s)$.

Notice that all simple instances have the same domain.

Definition 4.12 (Selection). Let $\mathcal{B} = (\Sigma, \mathcal{S})$ be a data base. Let $f$ be a function in $\mathcal{I}(\mathcal{B})$
and $(G_s, \sigma)$ a selector. Let $\mathcal{F}$ be the set of all simple instances induced by $G_s$ that are also
subinstances of $f$. The selection of $f$ by $(G_s, \sigma)$, denoted as $f|(G_s, \sigma)$, is $\bigoplus_{g \in \mathcal{F}} g$.
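Selection can be sketched by enumerating one candidate value per name and keeping the choices whose edge colors match the selector. The code below is an illustrative sketch only: the selector is given as a dict from schema edges to colors (it is assumed to contain at least one edge), `mu` is a dict from structure edges to colors, and all names are hypothetical.

```python
from itertools import product as cartesian


def simple_instances(f, selector_edges, mu):
    """Simple instances of the selector that are subinstances of f."""
    names = sorted({n for edge in selector_edges for n in edge})
    if any(n not in f for n in names):
        return []
    found = []
    # Try every way of picking exactly one value per name of the selector.
    for choice in cartesian(*(sorted(f[n]) for n in names)):
        pick = dict(zip(names, choice))
        if all(mu.get((pick[x], pick[y])) == color
               for (x, y), color in selector_edges.items()):
            found.append({n: {pick[n]} for n in names})
    return found


def selection(f, selector_edges, mu):
    """Addition (pointwise union) of all matching simple instances."""
    result = {}
    for instance in simple_instances(f, selector_edges, mu):
        for name, values in instance.items():
            result.setdefault(name, set()).update(values)
    return result


f = {"Person": {"p1", "p2"}, "City": {"c1"}}
mu = {("p1", "c1"): True, ("p2", "c1"): False}
print(selection(f, {("Person", "City"): True}, mu))
```

Because all simple instances share the same domain, their addition is always well formed; in this sketch an empty result stands for a selection that is undefined on every name.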

5 Stability 

Given a set X of instances, our first aim is to give a characterization of all instances that 
can be obtained with a query that uses only the information contained in the instances in 
X, or equivalently by an expression of the algebra that has only instances in X as operands. 
In such direction the main result of this section is that expressiveness in our graph algebra
is equivalent to the conservation of a certain partition. It is natural to associate a notion of 
undistinguishability to a partition, where all elements in a set of the partition are deemed 
undistinguishable. We share the goals of [12], but we have introduced in this paper a new 
framework, that is we are looking for a notion of expressiveness that is coherent with our 
meta theorem. Just as the notion of automorphisms, introduced in [12, 3] for relations, gives
a global description of the logical dependencies among data that must be preserved when
querying the data base, in our model a partition (or an equivalence relation) will represent
all such logical dependencies. The equivalence partition over elements of the structure that
we will study is called stability and is denoted by $\rightsquigarrow_{\mathcal{I}}$ (where $\mathcal{I}$ is a set of instances).

The simplest possible form of undifferentiation (called 0-stability) is based on the idea
that we are able to distinguish images of different vertices of the instance and vertices of the
extensional structure belonging to different functions of $\mathcal{I}$. Such notion basically consists of
using expressions in our algebra that do not contain any selection.

Definition 5.1. Let $A$ be a subset of the image of a set $\mathcal{I}$ of instances. Then $A$ is split by
$\mathcal{I}$ iff there is a function $f_i$ in $\mathcal{I}$ such that $A \cap Im(f_i) \neq \emptyset$ and $A - Im(f_i) \neq \emptyset$.

We are now able to introduce formally the definition of 0-stability, as follows: 

Definition 5.2 (0-stable). Let $\mathcal{I}$ be a set of instances over a data base $(\Sigma, \mathcal{S})$, and let $A$ be
a subset of $Im(\mathcal{I})$. Then $A$ is 0-stable w.r.t. $\mathcal{I}$ if the two following conditions hold:

1. $A \subseteq Im(f(x))$, where $f \in \mathcal{I}$, and $x \in Domain(\mathcal{I})$ is a name of the schema.

2. for each function $f \in \mathcal{I}$, $A$ and $Im(f)$ are disjoint or one is contained in the
other one.
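The two conditions of Definition 5.2 are easy to test on concrete data. The sketch below is illustrative only: instances are assumed to be dicts from names to sets of values, and the names `image` and `is_zero_stable` are hypothetical.

```python
def image(f):
    """Im(f): union of the images of all names in the domain of f."""
    return set().union(*f.values())


def is_zero_stable(a, instances):
    """Check conditions 1 and 2 of Definition 5.2 for the set `a`."""
    a = set(a)
    # Condition 1: a lies inside the image of a single name of some instance.
    within_one_name = any(a <= values for f in instances for values in f.values())
    # Condition 2: a and Im(f) are disjoint, or one contains the other.
    nested_or_disjoint = all(
        not (a & image(f)) or a <= image(f) or image(f) <= a
        for f in instances
    )
    return within_one_name and nested_or_disjoint


instances = [{"a": {1, 2, 3}, "b": {4}}, {"a": {1, 2}}]
print(is_zero_stable({1, 2}, instances))  # contained in both images
print(is_zero_stable({2, 3}, instances))  # split by the second instance
print(is_zero_stable({3, 4}, instances))  # spans two different names
```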

A more refined notion of undifferentiation is called 1-stability; informally a set $A$ is
1-stable w.r.t. $B$ if $A$ is 0-stable and $B$ is not able to distinguish two vertices of $A$ with edges
outgoing from $B$ and ingoing in $A$ (or outgoing from $A$ and ingoing in $B$). Notice that
1-stability is a binary relation over subsets of $Im(\mathcal{I})$, while 0-stability is a unary relation. The
formal definition is:

Definition 5.3 (1-stable). Let $\mathcal{I}$ be a set of instances over a data base $(\Sigma, \mathcal{S})$ and let $A$ and
$B$ be two disjoint subsets of $Im(\mathcal{I})$. Then $A$ is 1-stable w.r.t. $B$ and $\mathcal{I}$, denoted as $B \rightsquigarrow_{1,\mathcal{I}} A$,
if the following conditions are verified:

1. $A$ is 0-stable w.r.t. $\mathcal{I}$;

2. for each edge $(a_1, b_1)$ of $\mathcal{S}$, with $a_1 \in A$, $b_1 \in B$, and for each $a_2 \in A$ there exists $b_2 \in B$
such that $\mu(a_1, b_1) = \mu(a_2, b_2)$;

3. for each edge $(b_1, a_1)$ of $\mathcal{S}$, with $a_1 \in A$, $b_1 \in B$, and for each $a_2 \in A$ there exists $b_2 \in B$
such that $\mu(b_1, a_1) = \mu(b_2, a_2)$.

Informally 1-stability means that whenever there is a colored edge (say a red edge) from
a vertex of $A$ to a vertex of $B$, then all vertices of $A$ have a red edge ingoing in $B$. In other
words if we assume that $B$ is undistinguishable, then also $A$ is undistinguishable, by any
single-edge path. The notion of 1-stability can be further generalized, but first we need a
new definition.
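The edge conditions of 1-stability (conditions 2 and 3 of Definition 5.3) can be sketched as follows; 0-stability (condition 1) is assumed to be checked separately. The coloring is modeled as a dict `mu` from ordered pairs of structure vertices to colors, and the function name is hypothetical.

```python
def satisfies_edge_conditions(a, b, mu):
    """Conditions 2 and 3 of 1-stability for disjoint vertex sets a and b."""
    a, b = set(a), set(b)
    for (u, v), color in mu.items():
        if u in a and v in b:
            # Edge from A to B: every vertex of A needs an equally colored edge into B.
            if not all(any(mu.get((a2, b2)) == color for b2 in b) for a2 in a):
                return False
        if u in b and v in a:
            # Edge from B to A: every vertex of A needs an equally colored edge from B.
            if not all(any(mu.get((b2, a2)) == color for b2 in b) for a2 in a):
                return False
    return True


A, B = {"a1", "a2"}, {"b1", "b2"}
print(satisfies_edge_conditions(A, B, {("a1", "b1"): True, ("a2", "b2"): True}))
print(satisfies_edge_conditions(A, B, {("a1", "b1"): True, ("a2", "b1"): False}))
```

In the second call the vertex `a2` has no true-colored edge into `B`, so `B` can tell `a1` and `a2` apart.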
Definition 5.4 (Path). Let $G = (V, E)$ be a labeled graph. Then a colored path in $G$ is a
pair $(p, s)$ where $p = \langle v_1, e_1, v_2, \dots, v_{l-1}, e_{l-1}, v_l \rangle$, and for every $1 \le i \le l$, $v_i$ belongs to
$V$ and $e_i$ is an edge of $G$ such that $e_i = (v_i, v_{i+1})$ or $e_i = (v_{i+1}, v_i)$. Moreover $s$ is the
sequence $\langle \mu(e_1), \dots, \mu(e_{l-1}) \rangle$ where $\mu(e_i)$ is the color of the edge $e_i$ in $G$.

Notice that the definition of path used in the paper is different from the one that can be
usually found in a graph theory textbook, as arcs can also be in the reverse direction. The
length of a path is the number of edges it contains. Let $\mathcal{B} = (\Sigma, \mathcal{S})$ be a data base and let
$(p, s)$ be a colored path of $\mathcal{S}$, with $p = \langle v_1, e_1, v_2, \dots, e_n, v_{n+1} \rangle$. Then the path schema of
$(p, s)$ is the pair $(p', s)$, with $p' = \langle v'_1, e'_1, v'_2, \dots, e'_n, v'_{n+1} \rangle$, where for every $1 \le i \le n + 1$,
$v'_i = Ext^{-1}(v_i)$ and $e'_i = Ext^{-1}(e_i)$.

Definition 5.5. Let $x, y$ be two nodes of the $\mathcal{I}$-structure, and let $Z$ be a subset of $Im(\mathcal{I})$;
then the path dependencies from $x$ to $y$ in $Z$, denoted as $PD_{k,Z,\mathcal{I}}(x, y)$, is the set of path
schemata of all paths of the $\mathcal{I}$-structure that are starting in $x$ and ending in $y$ and entirely
contained in $Z$.

Informally, given $x, y, Z$, their path dependencies are obtained by removing all vertices not
in $Z$, then computing all possible paths from $x$ to $y$, and finally computing the respective
path schemata. The next step is to generalize 1-stability to $k$-stability, that is taking into
account length-$k$ paths instead of simple edges (that is length-1 paths).

Definition 5.6 ($k$-stable). Let $\mathcal{I}$ be a set of instances over a data base $(\Sigma, \mathcal{S})$, let $k$ be an
integer larger than one, and let $A$ and $B$ be two disjoint subsets of $Im(\mathcal{I})$. Then $A$ is $k$-stable
w.r.t. $B$ and $\mathcal{I}$, denoted as $B \rightsquigarrow_{k,\mathcal{I}} A$, if the following conditions are verified:

1. $A$ is 0-stable w.r.t. $\mathcal{I}$;

2. $A$ is $(k - 1)$-stable w.r.t. $B$ and $\mathcal{I}$;

3. for each $a_1, a_2 \in A$, $b_1 \in B$ there exists $b_2 \in B$ such that $PD_{k,A \cup B,\mathcal{I}}(a_1, b_1) \subseteq PD_{k,A \cup B,\mathcal{I}}(a_2, b_2)$.

The main idea is that when $B \rightsquigarrow_{k,\mathcal{I}} A$, then if $B$ is undistinguishable also $A$ is
undistinguishable when only paths no longer than $k$ are taken into account. Our main definition
follows:

Definition 5.7 (Stability). Let $\mathcal{I}$ be a set of instances, and let $A, B$ be two disjoint subsets
of $Im(\mathcal{I})$. Then $A$ is stable w.r.t. $B$ in $\mathcal{I}$, denoted as $B \rightsquigarrow_{\mathcal{I}} A$, if $B \rightsquigarrow_{k,\mathcal{I}} A$ for all $k \in \mathbb{N}$.



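By Def. 5.7, it is immediate to verify the following properties of stability:

Lemma 5.1. Let $\mathcal{I}$ be a set of instances, and let $A, B, C \subseteq Im(\mathcal{I})$; then:

1. if $B \cup C \rightsquigarrow_{\mathcal{I}} A$ and $names(B) \cap names(C) = \emptyset$, then $B \rightsquigarrow_{\mathcal{I}} A$ and $C \rightsquigarrow_{\mathcal{I}} A$;

2. if $names(B) = names(C) = \{x\}$, $B \rightsquigarrow_{\mathcal{I}} A$ and $C \rightsquigarrow_{\mathcal{I}} A$, then $B \cup C \rightsquigarrow_{\mathcal{I}} A$;

3. if $names(B) = names(C) = \{x\}$, $B \cap C \neq \emptyset$, $A \rightsquigarrow_{\mathcal{I}} B$ and $A \rightsquigarrow_{\mathcal{I}} C$, then $A \rightsquigarrow_{\mathcal{I}} B \cup C$.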
Stability is a relation between disjoint subsets of the domain. The definition of
expressiveness in the query language that we want to obtain is based on partitions, and now we are
able to introduce the class of partitions we are interested in. A partition $\mathcal{P}$ of nodes of the
structure is called valid if and only if for each set $A$ of the partition and every set $B$ that is
a union of sets of $\mathcal{P}$, $B$ cannot differentiate $A$.

Definition 5.8. Let $\mathcal{I}$ be a set of instances. A partition $\mathcal{P} = \{P_1, \dots, P_k\}$ of $Im(\mathcal{I})$ is valid
if for every $P_i \in \mathcal{P}$, $L \subseteq \{1, \dots, k\}$, $L \neq \emptyset$, $i \notin L$, then $\bigcup_{l \in L} P_l \rightsquigarrow_{\mathcal{I}} P_i$.

Given a set $\mathcal{I}$ of instances, there may be various valid partitions of $Im(\mathcal{I})$, and at least
one valid partition always exists (the partition where each vertex of the extensional structure
is a set). Some valid partitions are more representative of the actual undifferentiation; in fact
we will assume as a measure of the undifferentiation induced by $\mathcal{I}$ the coarsest valid partition,
which we will call canonical partition and denote as $\mathcal{C}_{\mathcal{I}}$. We can show that the definition of
canonical partition is well-formed.

Theorem 5.1. Every set $\mathcal{I}$ of instances has a unique canonical partition $\mathcal{C}_{\mathcal{I}}$.

Proof. Clearly the partition of $Im(\mathcal{I})$ into singletons is a valid partition, so there exists at
least one canonical partition. Now assume to the contrary that there exist two coarsest valid
partitions $\mathcal{P}_1$ and $\mathcal{P}_2$. Let $R_1$ and $R_2$ be the equivalence relations induced by the partitions
$\mathcal{P}_1$ and $\mathcal{P}_2$, respectively. Let $R^*$ be the transitive closure of the relation $R$ defined as follows:
$x R y$ if and only if $x$ and $y$ are in the same set of $\mathcal{P}_1$ or $\mathcal{P}_2$. We can prove that the partition $\mathcal{P}$
induced by $R^*$ is a valid one of index strictly less than $k$. By construction of $R^*$ each set of
$\mathcal{P}$ is a union of sets in $\mathcal{P}_1$ and also a union of sets in $\mathcal{P}_2$; moreover each set in $\mathcal{P}$ is contained
in the image of a single name (since each set must be 0-stable). Notice that $R^* \neq R_1$ iff
$\mathcal{P}_1 \neq \mathcal{P}_2$.

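Let $X_i$ be a set of $\mathcal{P}$, and let $Z_k$ be a class of $\mathcal{P}_1$ contained in $X_j$. Since $\mathcal{P}_1$ is a valid
partition, and $X_i$ is a union of disjoint sets of $\mathcal{P}_1$, by Definition 5.7 and Lemma 5.1 we have that $Z_k \rightsquigarrow_{\mathcal{I}} X_i$
and $X_i \rightsquigarrow_{\mathcal{I}} Z_k$. Let $X_i, X_j$ be two sets of $\mathcal{P}$, with $X_i = Z_{i_1} \cup \dots \cup Z_{i_r}$ and $X_j = Z_{j_1} \cup \dots \cup Z_{j_s}$.
We have already proved that $Z_{j_h} \rightsquigarrow_{\mathcal{I}} X_i$ and $X_i \rightsquigarrow_{\mathcal{I}} Z_{j_h}$ for each $h$; applying again Definition 5.7 and Lemma 5.1
and noting that $X_j = Z_{j_1} \cup \dots \cup Z_{j_s}$ we obtain $X_j \rightsquigarrow_{\mathcal{I}} X_i$. By the generality of $X_i$ and $X_j$ the
partition $\mathcal{P}$ is valid. Moreover $\mathcal{P}$ is coarser than $\mathcal{P}_1$, which is a contradiction. $\square$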

6 Expressiveness 

In this section we will prove our main result regarding the expressiveness of the graph-based 
query language by showing that a function can be computed if and only if adding such 
function does not change the canonical partition. In the following we will assume that the 
union of the images of all functions in I is exactly the universe set; such assumption does 
not violate the generality since otherwise we would simply have some sets of the canonical 
partition whose union consists of exactly those elements of the universe set that are not in 
any function in I. 

Theorem 6.1. Let $BI(\mathcal{I})$ be the set of functions that are a result of an expression of the
graph algebra where only functions of $\mathcal{I}$ are operands. Then $f \in BI(\mathcal{I})$ if and only if the
canonical partition induced by $\mathcal{I}$ is equal to the canonical partition induced by $\mathcal{I} \cup \{f\}$, that
is $\mathcal{C}_{\mathcal{I}} = \mathcal{C}_{\mathcal{I} \cup \{f\}}$.
The following two properties, that are consequences of Def. 5.8, will be useful to prove
the main result of the paper.

Proposition 6.2. Let $\mathcal{I}$ be a set of instances and let $\mathcal{P}$ be a valid partition. Then the image
of every instance $g \in BI(\mathcal{I})$ is the union of sets of $\mathcal{P}$.

Proof. We prove the lemma by induction on the number $n$ of operations of the expression for
$g$. If $n = 0$, then $g$ is a function in $\mathcal{I}$. Since all sets of a valid partition are 0-stable, no set of
a valid partition has both an element in $Im(g)$ and an element not in $Im(g)$; therefore the
union of all sets of $\mathcal{P}$ that are contained in $Im(g)$ is contained in $Im(g)$. To prove that such
containment is not strict (i.e. such union is equal to $Im(g)$) it is sufficient to note that all
elements of $Im(g)$ belong to some set of $\mathcal{P}$.

Assume now that $n > 0$ and $g$ is obtained by the application of an operation to two
expressions $f_1$ and $f_2$ in $BI(\mathcal{I})$, or one expression for a selection. Clearly, by inductive
hypothesis the images of $f_1$ and $f_2$ are obtained as the union of some sets of $\mathcal{P}$. It is
immediate to verify the lemma for the case that $g = f_1 \oplus f_2$, $g = f_1 \otimes f_2$, $g = f_1 \ominus f_2$
and $g = \Pi_A(f_1)$. Finally, assume that $g = f_1|G_s$, where $G_s$ is a selector. By definition of
the selection, $Im(g)$ is the union of the images of all simple instances induced by $G_s$.
Assume to the contrary that there exists a set $A$ of the valid partition such that $A \not\subseteq Im(g)$
and $A \cap Im(g) \neq \emptyset$. Now, let $y \in A - Im(g)$ and $x \in A \cap Im(g)$. Hence, by construction
of selection, $y$ cannot be in the image of any simple instance induced by $G_s$, while $x$ is
contained in the image of a simple instance induced by $G_s$. By inductive hypothesis $Im(f_1)$
is a union of sets of $\mathcal{P}$; moreover since $A$ is a set of $\mathcal{P}$, also $Im(f_1) - A$ is a union of sets of the
valid partition, implying that $Im(f_1) - A \rightsquigarrow_{\mathcal{I}} A$. It follows that, for each $z \in Im(f_1) - A$,
$PD_{Im(f_1),\mathcal{I}}(x, z) \subseteq PD_{Im(f_1),\mathcal{I}}(y, v)$ for some $v \in Im(f_1) - A$. This implies that there is
a simple instance induced by $G_s$ that has $y$ in its image, which is a contradiction with the
above assumption. Consequently, the image of $g$ must be a union of sets of $\mathcal{P}$. $\square$

We will prove that an alternative characterization of the canonical partition is as the partition
induced by the equivalence relation $R_{\mathcal{I}}$ between elements of $Im(\mathcal{I})$, where $x R_{\mathcal{I}} y$ if and only
if for every instance $f \in BI(\mathcal{I})$, $x \in Im(f) \iff y \in Im(f)$. In the rest of the paper
let $P^{BI}_{\mathcal{I}}$ denote the partition induced by the equivalence relation $R_{\mathcal{I}}$. We will then
complete the proof of our main result in two steps. First we will prove that a
function $f$ belongs to $BI(\mathcal{I})$ if and only if $Im(f)$ is a union of sets in $P^{BI}_{\mathcal{I}}$; then we will prove
that $P^{BI}_{\mathcal{I}} = \mathcal{C}_{\mathcal{I}}$. The following proposition is an immediate consequence of the definitions of
$x R_{\mathcal{I}} y$ and of projection.

Proposition 6.3. Let $x, y \in Im(\mathcal{I})$ such that $x R_{\mathcal{I}} y$. Then both $x$ and $y$ belong to the set
$Ext(z)$ for some name $z$.

Corollary 6.4. Let $P^{BI}_{\mathcal{I}}$ be the partition induced by a set $\mathcal{I}$ of instances. Then, the inverse
image of every set of the partition consists of a single vertex.
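The partition induced by $R_{\mathcal{I}}$ groups together the values that no expressible instance can separate. As a hedged illustration, the sketch below computes such a partition from a finite list of images; taking that list to stand in for the images of all instances in $BI(\mathcal{I})$ is an assumption of the sketch, and the function name is hypothetical.

```python
def induced_partition(universe, images):
    """Group elements of `universe` by the set of images that contain them."""
    classes = {}
    for x in universe:
        # Two elements are equivalent iff they belong to exactly the same images.
        signature = tuple(x in image for image in images)
        classes.setdefault(signature, set()).add(x)
    return {frozenset(block) for block in classes.values()}


universe = {1, 2, 3, 4}
images = [{1, 2, 3}, {3, 4}]
print(induced_partition(universe, images))
```

Here the values 1 and 2 end up in the same class because every listed image contains both or neither of them.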

Lemma 6.1. Let $\mathcal{I}$ be a set of instances over $\mathcal{B}$, let $P^{BI}_{\mathcal{I}}$ be the partition induced by $\mathcal{I}$, and
let $A \in P^{BI}_{\mathcal{I}}$. Then there exists an instance $f \in BI(\mathcal{I})$ such that $Im(f) = A$.
Proof. Let $\mathcal{F}$ be the set of functions $f \in BI(\mathcal{I})$ such that $Im(f) \subseteq A$. By Cor. 6.4 all
functions in $\mathcal{F}$ have the same domain, therefore the expression $g = \bigoplus_{f \in \mathcal{F}} f$ is well-formed;
by construction $Im(g) \subseteq A$. By definition of $P^{BI}_{\mathcal{I}}$ all functions $f \in BI(\mathcal{I})$ are such that
$Im(f) \subseteq A$ or $Im(f) \cap A = \emptyset$, therefore all functions $f \in BI(\mathcal{I})$ whose image intersects $A$
are such that $Im(f) \subseteq A$, which in turn implies that they are also in $\mathcal{F}$. Hence $Im(g) = A$, for
otherwise there would be an element of $A$ not belonging to the image of any function in $\mathcal{I}$. $\square$

Lemma 6.2. Let $\mathcal{I}$ be a set of instances over $\mathcal{B}$, let $P^{BI}_{\mathcal{I}}$ be the partition induced by $\mathcal{I}$ and
let $A$ be a union of sets in $P^{BI}_{\mathcal{I}}$, such that the inverse image of $A$ induces a weakly-connected
subgraph of the schema; then $A$ is the image of an instance $f \in BI(\mathcal{I})$.

Proof. Let $A_1, \dots, A_n$ be the sets of $P^{BI}_{\mathcal{I}}$ whose union is $A$, and notice that, by Lemma 6.1,
it is possible to associate to each set $A_i$ the instance $f_i \in BI(\mathcal{I})$ whose image is $A_i$; moreover
for each such $f_i$, $|Dom(f_i)| = 1$. For each vertex $x$ in the inverse image of $A$ we can construct
the function $g_x$ as $\bigoplus_{Dom(f_i) = \{x\}} f_i$. Then let $g = \bigotimes_x g_x$; it is immediate to note that $Im(g) = A$. $\square$



An immediate consequence of Lemmata 6.2 and 6.2 is the following: 



Corollary 6.5. Let $\mathcal{I}$ be a set of instances over $\mathcal{B}$ and let $f \in \mathcal{I}(\mathcal{B})$. Then $f \in BI(\mathcal{I})$ if and only
if $Im(f)$ is a union of sets in $P^{BI}_{\mathcal{I}}$ and the inverse image of $Im(f)$ induces a weakly-connected
subgraph of the schema.



With Lemma 6.2 we have proved that all interesting unions of sets of the partition $P^{BI}_{\mathcal{I}}$
can be obtained with an expression of the graph algebra where all operands are taken from
$\mathcal{I}$; therefore $P^{BI}_{\mathcal{I}}$ conveys all expressibility information. But $P^{BI}_{\mathcal{I}}$ is defined on the set $BI(\mathcal{I})$,
so we still need to correlate the definition of canonical partition with that of $P^{BI}_{\mathcal{I}}$.

Lemma 6.3. Let $\mathcal{I}$ be a set of instances. Then $P^{BI}_{\mathcal{I}}$ is a valid partition of $\mathcal{I}$.

Proof. Let $A, B$ be two disjoint sets where $A \in P^{BI}_{\mathcal{I}}$ and $B$ is a union of sets in $P^{BI}_{\mathcal{I}}$; we will
prove that $B \rightsquigarrow_{\mathcal{I}} A$. First of all we will show that $A$ is 0-stable. Remember that, by definition
of $P^{BI}_{\mathcal{I}}$, for each $f \in BI(\mathcal{I})$ either $Im(f) \supseteq A$ or $Im(f) \cap A = \emptyset$. Since $BI(\mathcal{I})$ contains $\mathcal{I}$,
it is immediate to note that $A$ is 0-stable. In the following let $a$ be the single-vertex inverse
image of $A$.

If the inverse image of $A \cup B$ does not induce a weakly-connected subgraph of the instance,
then 0-stability of $A$ suffices to prove that $B \rightsquigarrow_{\mathcal{I}} A$; therefore assume that the inverse image
of $A \cup B$ induces a weakly-connected subgraph of the instance. Let us assume that $B \rightsquigarrow_{\mathcal{I}} A$
does not hold; then we will get a contradiction. Without loss of generality we can assume
that $B$ is a minimum set for which $B \rightsquigarrow_{\mathcal{I}} A$ does not hold. The new assumption implies
that $k$-stability does not hold for some $k$. It follows that there are two elements $x, y \in A$
and an element $z \in B$ such that $PD_{A \cup B,\mathcal{I}}(x, z)$ is not contained in $PD_{A \cup B,\mathcal{I}}(y, v)$, for every
$v \in B$. Now, by Lemma 6.2, there is an instance $g \in BI(\mathcal{I})$ such that $Im(g) = A \cup B$. Let
$name(x) = x_1$ and $name(z) = z_1$ and let $v \in B$ such that $name(v) = z_1$. Let us consider the
function $h = \bigoplus_{G_s \in PD_{A \cup B,\mathcal{I}}(x, z)} \Pi_a(g|G_s)$. By construction $x$ belongs to $Im(h)$, but $y$ does
not; since $h \in BI(\mathcal{I})$ we have found a function in $BI(\mathcal{I})$ containing $x$ but not $y$, contradicting
the assumption that $x R_{\mathcal{I}} y$. $\square$
Lemma 6.4. Let $\mathcal{I}$ be a set of instances. Then $P^{BI}_{\mathcal{I}} = \mathcal{C}_{\mathcal{I}}$.

Proof. By Lemma 6.3 $P^{BI}_{\mathcal{I}}$ must be a valid partition of $\mathcal{I}$. By Lemma 6.1, every set $A \in P^{BI}_{\mathcal{I}}$
is obtained as the image of some instance $f \in BI(\mathcal{I})$. Clearly, since $f \in BI(\mathcal{I})$, by Lemma 6.2
the image $A$ of $f$ is the union of sets of the canonical partition of $\mathcal{I}$. Hence $P^{BI}_{\mathcal{I}} = \mathcal{C}_{\mathcal{I}}$. $\square$



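Theorem 6.6. Let $\mathcal{I}$ be a set of instances. Then an instance $g$ belongs to $BI(\mathcal{I})$ if and only
if $P^{BI}_{\mathcal{I}} = P^{BI}_{\mathcal{I} \cup \{g\}}$.

Proof. Clearly by Lemma 6.4 it suffices to show that $g \in BI(\mathcal{I})$ if and only if $P^{BI}_{\mathcal{I}} = P^{BI}_{\mathcal{I} \cup \{g\}}$;
moreover it is immediate to note that, by construction of $P^{BI}_{\mathcal{I}}$, if $g \in BI(\mathcal{I})$ then
$P^{BI}_{\mathcal{I}} = P^{BI}_{\mathcal{I} \cup \{g\}}$. Assume now that $P^{BI}_{\mathcal{I}} = P^{BI}_{\mathcal{I} \cup \{g\}}$. By Lemma 6.3 $P^{BI}_{\mathcal{I} \cup \{g\}}$ must be a valid partition.
By Lemma 6.2 and exploiting the assumption that $P^{BI}_{\mathcal{I}} = P^{BI}_{\mathcal{I} \cup \{g\}}$, for each $x \in Dom(g)$,
$g(x)$ must be the union of some sets in $P^{BI}_{\mathcal{I} \cup \{g\}}$, that is $Im(g(x)) = \bigcup A_j$, for some sets
$A_j \in P^{BI}_{\mathcal{I} \cup \{g\}}$. But by Lemma 6.1, for each set $A_j$ there is an instance $f_j \in BI(\mathcal{I})$ such
that $A_j$ is the image of $f_j$. Consequently, $g(x) = \bigoplus_j f_j$, and hence $g = \bigotimes_{x \in Dom(g)} g(x)$, which
proves that $g \in BI(\mathcal{I})$ as required. $\square$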

Theorem 6.6 and Lemma 6.4 lead to our main result.

Corollary 6.7. Let $\mathcal{I}$ be a set of instances. Then an instance $g$ belongs to $BI(\mathcal{I})$ if and only
if $\mathcal{C}_{\mathcal{I}} = \mathcal{C}_{\mathcal{I} \cup \{g\}}$.

7 Conclusions 

We have introduced the idea that partitions of the domain set can be used for characterizing 
the set of relations or graphs that can be extracted in a data base in the relational or in 
a graph-based model. By formally proving those expressiveness results we have effectively 
given a new framework for the analysis of data base query languages. 

The graph-based model presented here is not rich enough to be considered of practical
use; therefore it would be interesting to use our framework for analyzing a more sophisticated
graph-based model.